# Main Questions
## Sudden Fluctuation in Flow
From the instructions (General Instructions.pdf):
Sudden fluctuations are changes in flow by 150% or above within a 12 hour period.
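To make the arithmetic concrete, here's a toy check of that rule (the numbers are invented, not gauge readings): against a baseline of 100 ft³/s, a reading of 250 ft³/s is a 150% increase.

```r
# Percent change relative to a trailing baseline; values invented for illustration
pct_change <- function(flow, past_avg) (flow - past_avg) / past_avg

pct_change(250, 100) # 1.5, i.e. exactly a 150% increase
pct_change(150, 100) # 0.5, a 50% increase, well under the threshold
```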
Since New Hudson is measured in daily intervals, we’ll do this with the other two first.
```r
# 12 hours prior = 12 * 4 = 48 points, given the 15-minute interval
# (roll_meanr / roll_sumr come from RcppRoll)
avg_flow_tbl <- dam_tbl %>%
  filter(site != "New Hudson") %>%
  group_by(site) %>%
  arrange(datetime) %>%
  mutate(
    lag = lag(flow, 1L),
    past_avg_flow = roll_meanr(lag, 48, na.rm = TRUE),
    past_missing = roll_sumr(is.na(lag), 48)
  ) %>%
  ungroup() %>%
  select(-lag) %>%
  filter(!is.na(past_missing)) %>% # remove early entries without history
  mutate(pct_change = (flow - past_avg_flow) / past_avg_flow)

avg_flow_tbl %>%
  group_by(site) %>%
  slice(c(1:3, (n() - 2):n())) %>%
  kable()
```
| site | datetime | flow | past_avg_flow | past_missing | pct_change |
|---|---|---|---|---|---|
| Hamburg | 1990-01-03 11:45:00 | 208 | 211.0638 | 1 | -0.0145161 |
| Hamburg | 1990-01-03 12:00:00 | 210 | 211.0000 | 0 | -0.0047393 |
| Hamburg | 1990-01-03 12:15:00 | 210 | 210.9583 | 0 | -0.0045428 |
| Hamburg | 2017-11-01 16:45:00 | 163 | 162.7292 | 0 | 0.0016643 |
| Hamburg | 2017-11-01 17:00:00 | 163 | 162.7292 | 0 | 0.0016643 |
| Hamburg | 2017-11-01 17:15:00 | 163 | 162.7292 | 0 | 0.0016643 |
| Wall Street | 1990-01-01 11:45:00 | 353 | 345.9362 | 1 | 0.0204195 |
| Wall Street | 1990-01-01 12:00:00 | 363 | 346.0833 | 0 | 0.0488803 |
| Wall Street | 1990-01-01 12:15:00 | 363 | 346.1875 | 0 | 0.0485647 |
| Wall Street | 2017-11-01 16:45:00 | 372 | 380.8542 | 0 | -0.0232482 |
| Wall Street | 2017-11-01 17:00:00 | 372 | 380.7500 | 0 | -0.0229810 |
| Wall Street | 2017-11-01 17:15:00 | 372 | 380.6458 | 0 | -0.0227136 |
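As a sanity check on the window alignment, a short toy series with a 3-point window (assuming RcppRoll's right-aligned `roll_meanr`, as in the chunk above) confirms that lagging first keeps the current value out of its own trailing average:

```r
library(RcppRoll)

x <- c(10, 20, 30, 40, 50)
lagged <- c(NA, head(x, -1)) # same effect as dplyr::lag(x, 1L)

# Right-aligned rolling mean: the 4th entry averages 10, 20, 30, so
# each point is compared only against strictly earlier observations
roll_meanr(lagged, 3) # NA NA NA 20 30
```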
How many of our samples are more than 150% or 100% higher than the past 12-hour average?
```r
avg_flow_tbl %>%
  group_by(site) %>%
  summarize(
    n_over_150 = sum(pct_change >= 1.5, na.rm = TRUE),
    pct_over_150 = n_over_150 / n(),
    n_over_100 = sum(pct_change >= 1.0, na.rm = TRUE),
    pct_over_100 = n_over_100 / n()
  ) %>%
  kable()
```
| site | n_over_150 | pct_over_150 | n_over_100 | pct_over_100 |
|---|---|---|---|---|
| Hamburg | 0 | 0.0000000 | 0 | 0.0000000 |
| Wall Street | 1768 | 0.0018116 | 4022 | 0.0041211 |
So we see there are no cases at the Hamburg site where flow was more than 150% above the trailing 12-hour average, but there are some at the Wall Street site (1,768 samples, about 0.18%).
### New Hudson
We can’t do quite the same calculation for New Hudson since its flow is only available per day, but we can see which points exceed the thresholds relative to the prior day’s flow. This is a lot simpler to compute, and we also don’t have to sweat missing data.
```r
new_hudson_tbl <- dam_tbl %>%
  filter(site == "New Hudson") %>%
  arrange(datetime) %>%
  mutate(
    past_avg_flow = lag(flow, 1L),
    pct_change = (flow - past_avg_flow) / past_avg_flow
  ) %>%
  filter(!is.na(past_avg_flow))

new_hudson_tbl %>%
  summarize(
    n_over_150 = sum(pct_change >= 1.5),
    pct_over_150 = n_over_150 / n(),
    n_over_100 = sum(pct_change >= 1.0),
    pct_over_100 = n_over_100 / n()
  ) %>%
  kable()
```
| n_over_150 | pct_over_150 | n_over_100 | pct_over_100 |
|---|---|---|---|
| 5 | 0.0005532 | 18 | 0.0019916 |
So we see just a few days exceed the thresholds relative to the previous day; might as well pull them all.
```r
new_hudson_tbl %>%
  filter(pct_change > 1.0) %>%
  arrange(desc(pct_change)) %>%
  kable()
```
| site | datetime | flow | past_avg_flow | pct_change |
|---|---|---|---|---|
| New Hudson | 1990-09-07 | 151.0 | 50.0 | 2.020000 |
| New Hudson | 2001-04-14 | 66.0 | 24.0 | 1.750000 |
| New Hudson | 2011-07-28 | 121.0 | 46.1 | 1.624729 |
| New Hudson | 2007-08-20 | 79.0 | 30.9 | 1.556634 |
| New Hudson | 2012-08-10 | 110.0 | 43.9 | 1.505695 |
| New Hudson | 1999-11-08 | 162.0 | 65.0 | 1.492308 |
| New Hudson | 2005-11-14 | 104.0 | 42.9 | 1.424242 |
| New Hudson | 1996-04-21 | 89.0 | 38.0 | 1.342105 |
| New Hudson | 2008-09-14 | 262.0 | 112.0 | 1.339286 |
| New Hudson | 1998-11-02 | 138.0 | 59.0 | 1.338983 |
| New Hudson | 2005-11-01 | 142.0 | 63.0 | 1.253968 |
| New Hudson | 2000-04-20 | 58.0 | 27.0 | 1.148148 |
| New Hudson | 2007-07-27 | 55.7 | 26.0 | 1.142308 |
| New Hudson | 2001-07-30 | 102.0 | 48.0 | 1.125000 |
| New Hudson | 2006-09-13 | 88.9 | 43.2 | 1.057870 |
| New Hudson | 2004-12-22 | 128.0 | 63.2 | 1.025316 |
| New Hudson | 2003-10-31 | 157.0 | 78.4 | 1.002551 |
Plotting them might help show whether these threshold-breaking increases are a thing of the past or still happen.
```r
avg_flow_tbl <- bind_rows(new_hudson_tbl, avg_flow_tbl)

avg_flow_tbl %>%
  filter(!is.na(pct_change)) %>%
  ggplot(aes(x = datetime, y = pct_change, color = site)) +
  facet_wrap(~site, ncol = 1, scales = "free_y") +
  geom_line() +
  geom_hline(yintercept = c(1, 1.5), linetype = 2) +
  labs(
    x = "Date",
    y = "Change in Flow",
    title = "Change in River Flow Over Time"
  ) +
  theme(legend.position = "none") +
  scale_y_continuous(labels = scales::percent)
```
## Flow Targets
The USGS sets targets for a minimum level of flow beyond each dam. How often do we dip below those?
```r
# manually extracted from "Example Numeric Flow Targets.docx"
flow_targets <- read_csv("../Huron River Watershed Council/Data/flow_targets.csv", col_types = "ciiii") %>%
  mutate(site = factor(site, levels = unique(site), labels = levels(dam_tbl$site)))

kable(flow_targets)
```
| site | base_flow | min_flow | spring_low_flow | spring_high_flow |
|---|---|---|---|---|
| New Hudson | 58 | 46 | 60 | 284 |
| Hamburg | 154 | 124 | 155 | 651 |
| Wall Street | 272 | 218 | 307 | 1665 |
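That relabeling step leans on the CSV listing sites in the same order as `dam_tbl`'s factor levels. A toy version of the same factor trick (the codes here are invented stand-ins, not what the CSV actually contains):

```r
csv_sites <- c("NH", "HB", "WS") # hypothetical codes as a CSV might store them
dam_levels <- c("New Hudson", "Hamburg", "Wall Street")

# levels = unique(csv_sites) pins the order; labels swaps in the display names
factor(csv_sites, levels = unique(csv_sites), labels = dam_levels)
# New Hudson Hamburg Wall Street
```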
We can count how often we’re below the minimums:
```r
dam_tbl %>%
  left_join(flow_targets, by = "site") %>%
  filter(!is.na(flow)) %>%
  group_by(site) %>%
  summarize(
    n = n(),
    n_low_flow = sum(flow < min_flow),
    pct_low_flow = n_low_flow / n
  ) %>%
  kable()
```
| site | n | n_low_flow | pct_low_flow |
|---|---|---|---|
| New Hudson | 9039 | 783 | 0.0866246 |
| Hamburg | 871618 | 169229 | 0.1941550 |
| Wall Street | 959972 | 190867 | 0.1988256 |
So we see that all three sites spend some time below the threshold. Hamburg and Wall Street each sit under the prescribed minimum flow nearly 20% of the time.
Let’s plot our flows again, with the minimum flow limits superimposed.
```r
temp <- dam_tbl %>%
  mutate(
    year = year(datetime),
    # shift every timestamp into the most recent year so all years overlay on one seasonal axis
    new_datetime = set_year(datetime, max(year))
  ) %>%
  filter(complete.cases(.))

targets_long <- flow_targets %>%
  gather(limit, flow, -site)

limits <- targets_long %>%
  filter(limit %in% c("min_flow")) %>%
  mutate(new_datetime = min(temp$new_datetime))
```
```r
temp %>%
  ggplot(aes(x = new_datetime, y = flow, color = site)) +
  facet_wrap(~site, ncol = 1, scales = "free_y") +
  geom_line(aes(group = year, alpha = year)) +
  geom_line(data = . %>% group_by(site, new_datetime) %>% summarize(flow = mean(flow, na.rm = TRUE)), size = 1.5) +
  geom_hline(data = limits, aes(yintercept = flow), linetype = 2) +
  geom_text(
    data = limits,
    aes(x = new_datetime, y = flow, label = flow),
    vjust = -1, hjust = 1.1, color = "black", size = 3
  ) +
  scale_x_datetime(date_breaks = "1 month", date_labels = "%b") +
  labs(
    x = "Day of Year",
    y = bquote('Flow ('*ft^3 / s*')'),
    alpha = "Year",
    title = "Flow by Site and Year",
    subtitle = "with minimum flow guidelines (dotted line)"
  ) +
  guides(color = FALSE)
```
### Spring Storm Season
We can also focus on the spring storm season (April 15 - June 30). What percent of the time are we outside the prescribed limits?
```r
spring_only <- temp %>%
  filter(
    new_datetime >= as.POSIXct(paste0(unique(year(new_datetime)), "-04-15"), tz = "UTC"),
    new_datetime < as.POSIXct(paste0(unique(year(new_datetime)), "-07-01"), tz = "UTC")
  ) %>%
  left_join(flow_targets, by = "site")

spring_only %>%
  select(site, flow, spring_low_flow, spring_high_flow) %>%
  filter(!is.na(flow)) %>%
  group_by(site) %>%
  summarize(
    n = n(),
    # between() takes scalar bounds, so check each row with pmap
    n_outside_limits = sum(pmap_dbl(list(flow, spring_low_flow, spring_high_flow), ~ !between(..1, ..2, ..3))),
    pct_outside = n_outside_limits / n
  ) %>%
  kable()
```
| site | n | n_outside_limits | pct_outside |
|---|---|---|---|
| New Hudson | 1925 | 298 | 0.1548052 |
| Hamburg | 202653 | 38912 | 0.1920129 |
| Wall Street | 205344 | 45126 | 0.2197581 |
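The row-wise `between()` check above can be exercised on a few toy rows (made-up flows and bounds) to show why each row needs its own limits:

```r
library(dplyr)
library(purrr)

toy <- tibble(
  flow = c(50, 120, 700), # invented readings
  low  = c(60, 100, 155), # invented per-row lower bounds
  high = c(284, 284, 651) # invented per-row upper bounds
)

# between() expects scalar bounds, so pmap walks the rows one at a time
pmap_lgl(toy, ~ !between(..1, ..2, ..3)) # TRUE FALSE TRUE
```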
Again, plenty of operation outside the prescribed limits. We can see this visually by plotting flow once again and imposing the spring-specific guideline on top.
```r
limits <- targets_long %>%
  filter(grepl("spring", limit)) %>%
  mutate(new_datetime = min(spring_only$new_datetime))

spring_only %>%
  ggplot(aes(x = new_datetime, y = flow, color = site)) +
  facet_wrap(~site, ncol = 1, scales = "free_y") +
  geom_line(aes(group = year, alpha = year)) +
  geom_line(data = . %>% group_by(site, new_datetime) %>% summarize(flow = mean(flow, na.rm = TRUE)), size = 1.5) +
  geom_hline(
    data = limits,
    aes(yintercept = flow),
    linetype = 2
  ) +
  geom_text(
    data = limits,
    aes(x = new_datetime, y = flow, label = flow),
    vjust = -1, hjust = 1.1, color = "black", size = 3
  ) +
  scale_x_datetime(date_breaks = "1 month", date_labels = "%B") +
  labs(
    x = "Date",
    y = bquote('Flow ('*ft^3 / s*')'),
    alpha = "Year",
    title = "Spring Flow by Site and Year",
    subtitle = "with flow guidelines (dotted lines)"
  ) +
  guides(color = FALSE)
```